AITopics | support constraint

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Neural Information Processing Systems | Jun-11-2026, 06:36:06 GMT

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.
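As a rough illustration of the neighborhood constraint described in this abstract, the following Python sketch computes a Bellman target whose action is restricted to an L2 ball around a single dataset action. It is not the authors' ANQ algorithm, which the abstract says builds on a bilevel optimization framework; the random-search maximization, the names `neighborhood_target` and `toy_q`, the radius `eps`, and the toy Q function are all assumptions made for illustration.

```python
import numpy as np

def toy_q(state, action):
    # Hypothetical Q function: prefers actions close to the state vector.
    return -float(np.sum((action - state) ** 2))

def neighborhood_target(q_fn, reward, next_state, next_action, eps,
                        gamma=0.99, n_candidates=64, rng=None):
    """Bellman target with the max restricted to an L2 ball of radius `eps`
    around the dataset action `next_action` (illustrative helper only)."""
    rng = np.random.default_rng() if rng is None else rng
    dim = next_action.shape[0]
    # Sample candidate actions uniformly inside the ball of radius eps.
    directions = rng.normal(size=(n_candidates, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = eps * rng.uniform(size=(n_candidates, 1)) ** (1.0 / dim)
    candidates = next_action + radii * directions
    # Always include the dataset action itself, so eps = 0 reduces to
    # evaluating Q at the dataset action.
    candidates = np.vstack([next_action[None, :], candidates])
    q_values = np.array([q_fn(next_state, a) for a in candidates])
    best = int(np.argmax(q_values))
    return reward + gamma * q_values[best], candidates[best]

# Example: a larger radius lets the target action move toward higher Q values.
rng = np.random.default_rng(0)
s_next = np.array([1.0, 0.0])
a_data = np.array([0.0, 0.0])
target, chosen = neighborhood_target(toy_q, 1.0, s_next, a_data, eps=0.5, rng=rng)
```

Passing a different `eps` for each data point would mimic the pointwise conservatism the abstract mentions; how the radius is actually adapted to data quality is not specified in this listing.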

constraint, machine learning, reinforcement learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


cdda0657a9f32bc7ddd4343686e7371e-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 04:41:39 GMT

The optimization phase in deep learning consists in minimizing an objective function w.r.t. the set of

artificial intelligence, machine learning, neural network, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


caa934a507a952698d54efb24845fc4b-Paper-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 22:21:25 GMT

algorithm, behavior policy, constraint, (11 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints

Neural Information Processing Systems | Dec-25-2025, 02:01:35 GMT

Offline reinforcement learning (RL) learns policies entirely from static datasets. Practical applications of offline RL will inevitably require learning from datasets where the variability of demonstrated behaviors changes non-uniformly across the state space. For example, at a red light, nearly all human drivers behave similarly by stopping, but when merging onto a highway, some drivers merge quickly, efficiently, and safely, while many hesitate or merge dangerously.

behavior policy, heteroskedastic dataset, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.42)


Adversarially and Distributionally Robust Virtual Energy Storage Systems via the Scenario Approach

Pantazis, Georgios, Mignoni, Nicola, Carli, Raffaele, Dotoli, Mariagrazia, Grammatico, Sergio

arXiv.org Artificial Intelligence | Nov-13-2025

We propose an optimization model where a parking lot manager (PLM) can aggregate parked EV batteries to provide virtual energy storage services that are provably robust under uncertain EV departures and state-of-charge caps. Our formulation yields a data-driven convex optimization problem where a prosumer community agrees on a contract with the PLM for the provision of storage services over a finite horizon. Leveraging recent results in the scenario approach, we certify out-of-sample constraint safety. Furthermore, we enable a tunable profit-risk trade-off through scenario relaxation and extend our model to account for robustness to adversarial perturbations and distributional shifts over Wasserstein-based ambiguity sets. All the approaches are accompanied by tight finite-sample certificates. Numerical studies demonstrate the out-of-sample and out-of-distribution constraint satisfaction of our proposed model compared to the developed theoretical guarantees, showing their effectiveness and potential in robust and efficient virtual energy services.

artificial intelligence, constraint, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

arXiv:2511.09427

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
Automobiles & Trucks (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)


Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Mao, Yixiu, Qu, Yun, Wang, Qi, Ji, Xiangyang

arXiv.org Artificial Intelligence | Nov-5-2025

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces challenges in accurately modeling the behavior policy. To overcome these limitations, we propose a new neighborhood constraint that restricts action selection in the Bellman target to the union of neighborhoods of dataset actions. Theoretically, the constraint not only bounds extrapolation errors and distribution shift under certain conditions, but also approximates the support constraint without requiring behavior policy modeling. Moreover, it retains substantial flexibility and enables pointwise conservatism by adapting the neighborhood radius for each data point. In practice, we employ data quality as the adaptation criterion and design an adaptive neighborhood constraint. Building on an efficient bilevel optimization framework, we develop a simple yet effective algorithm, Adaptive Neighborhood-constrained Q learning (ANQ), to perform Q learning with target actions satisfying this constraint. Empirically, ANQ achieves state-of-the-art performance on standard offline RL benchmarks and exhibits strong robustness in scenarios with noisy or limited data.

constraint, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

arXiv:2511.02567

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


cdda0657a9f32bc7ddd4343686e7371e-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 07:50:21 GMT

The optimization phase in deep learning consists in minimizing an objective function w.r.t. the set of

artificial intelligence, machine learning, neural network, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


6692e1b0e8a31e8de84bd90ad4d8d9e0-Paper-Conference.pdf

Neural Information Processing Systems | Sep-27-2025, 19:18:48 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > Canada (0.68)
North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


caa934a507a952698d54efb24845fc4b-Paper-Conference.pdf

Neural Information Processing Systems | Aug-18-2025, 22:43:58 GMT

constraint, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)


ReDS: Offline RL With Heteroskedastic Datasets via Support Constraints

Neural Information Processing Systems | Jan-17-2025, 14:18:18 GMT

Offline reinforcement learning (RL) learns policies entirely from static datasets. Practical applications of offline RL will inevitably require learning from datasets where the variability of demonstrated behaviors changes non-uniformly across the state space. For example, at a red light, nearly all human drivers behave similarly by stopping, but when merging onto a highway, some drivers merge quickly, efficiently, and safely, while many hesitate or merge dangerously. Both theoretically and empirically, we show that typical offline RL methods, which are based on distribution constraints, fail to learn from data with such non-uniform variability because they must stay close to the behavior policy to the same extent across the state space. Ideally, the learned policy should be free to choose, per state, how closely to follow the behavior policy to maximize long-term return, as long as it stays within the support of the behavior policy.

behavior policy, heteroskedastic dataset, support constraint, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

